AITopics | document reader

document reader

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AskSport: Web Application for Sports Question-Answering

Onofre, Enzo B, Moraes, Leonardo M P, Aguiar, Cristina D

arXiv.org Artificial Intelligence, Mar-26-2025

This paper introduces AskSport, a question-answering web application about sports. It allows users to ask questions using natural language and retrieve the three most relevant answers, including related information and documents. The paper describes the characteristics and functionalities of the application, including use cases demonstrating its ability to return names and numerical values. AskSport and its implementation are available for public access on HuggingFace.

artificial intelligence, natural language, question answering, (17 more...)

arXiv: 2503.21067

Country: South America > Brazil > São Paulo (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.92)

MathReader: Text-to-Speech for Mathematical Documents

Hyeon, Sieun, Jung, Kyudan, Kim, Nam-Joon, Ryu, Hyun Gon, Do, Jaeyoung

arXiv.org Artificial Intelligence, Jan-13-2025

Text-to-speech (TTS) document readers from Microsoft, Adobe, Apple, and OpenAI are in service worldwide. They produce relatively good results for general plain text, but they sometimes skip content or give unsatisfactory results for mathematical expressions. This is because most modern academic papers are written in LaTeX, and compiled LaTeX formulas are rendered as distinctive text forms within the document. Traditional TTS document readers, however, read out only the text as it is recognized, without considering the mathematical meaning of the formulas. To address this issue, we propose MathReader, which integrates OCR, a fine-tuned T5 model, and TTS. MathReader achieved a lower Word Error Rate (WER) than existing TTS document readers, such as Microsoft Edge and Adobe Acrobat, when processing documents containing mathematical formulas: 0.281 versus 0.510 for Microsoft Edge and 0.617 for Adobe Acrobat. This should help alleviate the inconvenience faced by users who want to listen to documents, especially those who are visually impaired. The code is available at https://github.com/hyeonsieun/MathReader.
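For readers unfamiliar with the metric quoted in this abstract, the sketch below shows the textbook definition of Word Error Rate: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference. It is not taken from the MathReader repository, and the abstract does not say how transcripts were normalized, so the whitespace tokenization and the function names `word_error_rate` and `relative_reduction` are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Made-up sentences, purely illustrative (not from the paper's data).
    spoken_reference = "x squared plus y squared equals one"
    reader_output = "x two plus y two equals one"
    print(word_error_rate(spoken_reference, reader_output))

    # WER figures as reported in the abstract.
    print(relative_reduction(0.510, 0.281))  # versus Microsoft Edge
    print(relative_reduction(0.617, 0.281))  # versus Adobe Acrobat
```

Applied to the reported figures, `relative_reduction` works out to roughly 45% fewer word errors than Microsoft Edge and roughly 54% fewer than Adobe Acrobat.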

formula, large language model, machine learning, (19 more...)

arXiv: 2501.07088

Country: North America > United States (0.15)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.37)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Envisioning the Next-Gen Document Reader

Yeh, Catherine, Lipka, Nedim, Dernoncourt, Franck

arXiv.org Artificial Intelligence, Feb-15-2023

People read digital documents on a daily basis to share, exchange, and understand information in electronic settings. However, current document readers create a static, isolated reading experience, which does not support users' goals of gaining more knowledge and performing additional tasks through document interaction. In this work, we present our vision for the next-gen document reader that strives to enhance user understanding and create a more connected, trustworthy information experience. We describe 18 NLP-powered features to add to existing document readers and propose a novel plug-in marketplace that allows users to further customize their reading experience, as demonstrated through 3 exploratory UI prototypes available at https://github.com/catherinesyeh/nextgen-prototypes

artificial intelligence, document reader, natural language, (17 more...)

arXiv: 2302.07492

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Indonesia > Sumatra (0.05)
(4 more...)

Genre: Research Report (0.40)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications (0.94)
Information Technology > Information Management > Search (0.68)

Scholastic: Graphical Human-AI Collaboration for Inductive and Interpretive Text Analysis

Hong, Matt-Heun, Marsh, Lauren A., Feuston, Jessica L., Ruppert, Janet, Brubaker, Jed R., Szafir, Danielle Albers

arXiv.org Artificial Intelligence, Aug-12-2022

Interpretive scholars generate knowledge from text corpora by manually sampling documents, applying codes, and refining and collating codes into categories until meaningful themes emerge. Given a large corpus, machine learning could help scale this data sampling and analysis, but prior research shows that experts are generally concerned about algorithms potentially disrupting or driving interpretive scholarship. We take a human-centered design approach to addressing concerns around machine-assisted interpretive research to build Scholastic, which incorporates a machine-in-the-loop clustering algorithm to scaffold interpretive text analysis. As a scholar applies codes to documents and refines them, the resulting coding schema serves as structured metadata which constrains hierarchical document and word clusters inferred from the corpus. Interactive visualizations of these clusters can help scholars strategically sample documents further toward insights. Scholastic demonstrates how human-centered algorithm design and visualizations employing familiar metaphors can support inductive and interpretive research methodologies through interactive topic modeling and document clustering.

category, code and category, scholastic, (17 more...)

doi: 10.1145/3526113.3545681

arXiv: 2208.06133

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Education (0.48)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
